Discrimination of two channels by adaptive 
methods and its application to quantum system 

Masahito Hayashi 
Abstract 

The optimal exponential error rate for adaptive discrimination of two channels is discussed. In this problem, adaptive choice 
of input signal is allowed. This problem is discussed in various settings. It is proved that adaptive choice does not improve the 
exponential error rate in these settings. These results are applied to quantum state discrimination. 

Index Terms 

Simple hypothesis testing, Channel, Discrimination, Quantum state, One-way LOCC, Active learning, Experimental design, 
Stein's lemma, Chernoff bound, Hoeffding bound, Han-Kobayashi bound 

I. Introduction 

DISCRIMINATING two distributions is treated as a fundamental problem in the field of statistical inference. This problem 
can be regarded as simple hypothesis testing because both hypotheses consist of a single distribution. Many researchers, including
Stein, Chernoff [3], Hoeffding [16], and Han-Kobayashi [10], have studied the asymptotic behavior when the number n of 
identical and independent observations is sufficiently large. They formulated a simple hypothesis testing/discrimination of 
two distributions as an optimization problem and derived the respective optimum value, e.g., the optimal exponential error 
rate. We call these optimum values the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound, 
respectively. Han [8], [9] later extended these results to the discrimination of two general sequences of distributions, including 
the Markovian case. Nagaoka-Hayashi [21] simplified Han's discussion and generalized Han's extension of the Han-Kobayashi 
bound. 

In the present paper, we consider another extension of the above results. That is, we extend the above results to the 
discrimination of two (classical) channels, in which two probabilistic transition matrices are given. Such a problem has appeared 
in Blahut [2]. In this problem, the number of applications of the channel is fixed to a given constant n, and we can choose 
appropriate inputs for this purpose. We assume that the given channel is memoryless. If we use the same input for 
all applications of the given channel, the n output data are independent and identically distributed. This property holds 
even if we choose each input randomly according to the same distribution on input signals. This strategy is called the non-adaptive 
method. In particular, when the same input is applied to all channels, it is called the deterministic non-adaptive method. If 
the input is determined stochastically, it is called the stochastic non-adaptive method, which was treated by Blahut [2]. In the 
non-adaptive method, our task is to choose the optimal input for distinguishing two channels most efficiently. In the present 
paper, we assume that we can choose the k-th input signal based on the preceding k - 1 output data. This strategy is called the 
adaptive method, which is the main focus of the present paper. In parameter estimation, such an adaptive method improves 
estimation performance. That is, in the one-parameter estimation, the asymptotic estimation error is bounded by the inverse 
of the optimum Fisher information. However, if we do not apply the adaptive method, it is generally impossible to realize 
the optimum Fisher information in all points at the same time. It is known that the adaptive method realizes the optimum 
Fisher information in all points [13], [7]. Therefore, one may expect that the adaptive method improves the performance of 
discriminating two channels. 

As our main result, we succeeded in proving that the adaptive method cannot improve the non-adaptive method in the 
sense of all of the above mentioned bounds, i.e., the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han- 
Kobayashi bound. That is, there is no difference between the non-adaptive method and the adaptive method in these asymptotic 
formulations. Indeed, as is proven herein, the deterministic non-adaptive method gives the optimum performance with respect 
to the Stein bound, the Chernoff bound, and the Hoeffding bound. However, in order to attain the Han-Kobayashi bound, in 
general, we need the stochastic non-adaptive method. 

On the other hand, the research field in quantum information has treated the discrimination of two quantum states. Hiai- 
Petz[15] and Ogawa-Nagaoka[18] proved the quantum version of Stein's lemma. Audenaert et al. [1] and Nussbaum-Szkola 
[23], [24] obtained the quantum version of the Chernoff bound. 

Ogawa-Hayashi [17] derived a lower bound of the quantum version of the Hoeffding bound. Later, Hayashi [12] and 
Nagaoka [20] obtained its tight bound based on the results by Audenaert et al. [1] and Nussbaum-Szkola [23], [24]. Hayashi 
[11] (in p. 90) obtained the quantum version of the Han-Kobayashi bound based on Nagaoka[19]'s discussion. These discussions 
assume that any measurement on the n-tensor product system is allowed for testing the given state. Hence, the next goal is 
the derivation of these bounds under some locality restrictions on an n-partite system for possible measurements. One easy 
setting is restricting the present measurement to be identical to that in the respective system. In this case, our task is the 
choice of the optimal measurement on the single system. By considering the measurement and the quantum state as the input 
and the channel, respectively, we can treat this problem by the non-adaptive method of the classical channel. Another setting 
is restricting our measurement to one-way local operations and classical communications (one-way LOCC). In the above- 
mentioned correspondence, the one-way LOCC setting can be regarded as the adaptive method of the classical channel. Hence, 
applying the above argument to discrimination of two quantum states, we can conclude that one-way communication does not 
improve discrimination of two quantum states in the respective asymptotic formulations. 

Furthermore, the same problem appears in adaptive experimental design and active learning. In learning theory, we identify 
the given system by using the obtained sequence of input and output pairs. In particular, in active learning, we can choose 
the inputs using the preceding data. Hence, the present result indicates that active learning does not improve the performance 
of learning when the candidates of the unknown system are given by only two classical channels. In experimental design, we 
choose suitable design of our experiment for inferring the unknown parameter. Adaptive improvement for the design is allowed 
in adaptive experimental design. When the candidates of the unknown parameter are only two values, the obtained result can 
be applied. That is, adaptive improvement for design does not work. 

The remainder of the present paper is organized as follows. Section II reviews the Stein bound, the Chernoff bound, the 
Hoeffding bound, and the Han-Kobayashi bound in discrimination of two probability distributions. In Section III, we present 
our formulation and notations of the adaptive method in the discrimination of two (classical) channels, and discuss the 
adaptive-method versions of the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound, respectively. 
In Section IV, we consider a simple example, in which the stochastic non-adaptive method is required for attaining the 
Han-Kobayashi bound. In Section V, we apply the present result to discrimination of two quantum states by one-way LOCC. In 
Sections VI, VII, and VIII, we prove the adaptive-method versions of the Stein bound, the Hoeffding bound, 
and the Han-Kobayashi bound, respectively. 



II. Discrimination/simple hypothesis testing between two probability distributions 

In preparation for the main topic, we review the simple hypothesis testing problem for the null hypothesis $H_0 : P^n$ versus the 
alternative hypothesis $H_1 : \overline{P}^n$, where $P^n$ and $\overline{P}^n$ are the $n$-fold independent and identical distributions of $P$ and $\overline{P}$, respectively, 
on the probability space $\mathcal{Y}$. The problem is to decide which hypothesis is true based on $n$ outputs $y_1, \ldots, y_n$. In the following, 
randomized tests are allowed as our decision. Hence, our decision method is described by a $[0,1]$-valued function $f$ on $\mathcal{Y}^n$. 
When we observe $n$ outputs $y_1, \ldots, y_n$, we accept the alternative hypothesis $\overline{P}$ with the probability $f(y_1, \ldots, y_n)$. We have 
two types of errors. In the first type, the null hypothesis $P$ is rejected despite being correct. In the second type, the alternative 
$\overline{P}$ is rejected despite being correct. Hence, the first type of error probability is given by $E_{P^n} f$, and the second type of error 
probability is given by $E_{\overline{P}^n}(1 - f)$. Note that $E_P$ denotes the expectation under the distribution $P$. 

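In the following, we assume that 

\[ \Phi(s|P\|\overline{P}) := \int_{\mathcal{Y}} \Bigl( \frac{\partial \overline{P}}{\partial P}(y) \Bigr)^{s} P(dy) < \infty, \qquad \phi(s|P\|\overline{P}) := \log \Phi(s|P\|\overline{P}), \]

and $\phi(s|P\|\overline{P})$ is $C^2$-continuous. In the present paper, we choose the base of the logarithm to be $e$. In the discrimination of 
two distributions, we treat the two types of error probabilities equally. Then, we simply minimize the equal sum $E_{P^n} f + E_{\overline{P}^n}(1 - f)$. 
Its optimal rate of exponential decrease is characterized by the Chernoff bound [3]: 

\[ C(P, \overline{P}) := \lim_{n\to\infty} -\frac{1}{n} \log \Bigl( \min_{f_n} E_{P^n} f_n + E_{\overline{P}^n}(1 - f_n) \Bigr) = - \min_{0 \le s \le 1} \phi(s|P\|\overline{P}). \]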
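As a numerical illustration of the Chernoff bound, the following minimal sketch assumes a finite output set and full-support distributions, so that $\phi(s|P\|\overline{P}) = \log \sum_y P(y)^{1-s} \overline{P}(y)^s$. The function names, the grid search over $s$, and the two binary distributions are assumptions made only for this example.

```python
import math

def phi(s, P, Pbar):
    """phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s (full-support lists)."""
    return math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Pbar)))

def chernoff_exponent(P, Pbar, steps=1000):
    """Grid approximation of -min_{0<=s<=1} phi(s|P||Pbar)."""
    return -min(phi(k / steps, P, Pbar) for k in range(steps + 1))

# Hypothetical binary distributions, chosen only for illustration.
P = [0.5, 0.5]
Pbar = [0.9, 0.1]
exponent = chernoff_exponent(P, Pbar)
```

Since $\phi(0) = \phi(1) = 0$, the grid minimum is never positive, so the returned exponent is non-negative; it vanishes when the two distributions coincide.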

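In order to treat these two error probabilities asymmetrically, we often restrict the first type of error probability $E_{P^n} f$ to below 
a particular threshold $\epsilon$, and minimize the second type of error probability $E_{\overline{P}^n}(1 - f)$: 

\[ \beta^*_n(\epsilon) := \min_{f} \{ E_{\overline{P}^n}(1 - f) \mid E_{P^n} f \le \epsilon \}. \]

Then, Stein's lemma holds. For $0 < \forall \epsilon < 1$, the equation 

\[ \lim_{n\to\infty} \frac{1}{n} \log \beta^*_n(\epsilon) = -D(P\|\overline{P}) \qquad (1) \]

holds, where the relative entropy $D(P\|\overline{P})$ is defined by 

\[ D(P\|\overline{P}) = \int_{\mathcal{Y}} -\log \frac{\partial \overline{P}}{\partial P}(y) \, P(dy). \]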
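For a finite output set, the relative entropy appearing in Stein's lemma reduces to a finite sum. The sketch below is only an illustration under that assumption; the function name and the two example distributions are hypothetical.

```python
import math

def relative_entropy(P, Pbar):
    """D(P||Pbar) = sum_y P(y) * log(P(y) / Pbar(y)), natural logarithm."""
    return sum(p * math.log(p / q) for p, q in zip(P, Pbar) if p > 0)

# Hypothetical binary distributions, chosen only for illustration.
P = [0.5, 0.5]
Pbar = [0.9, 0.1]

# By Stein's lemma, the second-type error decays roughly like exp(-n * stein_exponent).
stein_exponent = relative_entropy(P, Pbar)
```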
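Indeed, this lemma has the following variant form. Define 

\[ B(P\|\overline{P}) := \sup_{\{f_n\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{\overline{P}^n}(1 - f_n) \Bigm| \lim_{n\to\infty} E_{P^n} f_n = 0 \Bigr\}, \]
\[ B^*(P\|\overline{P}) := \sup_{\{f_n\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{\overline{P}^n}(1 - f_n) \Bigm| \lim_{n\to\infty} E_{P^n} f_n < 1 \Bigr\}. \]

Then, these two quantities satisfy the following relations: 

\[ B(P\|\overline{P}) = B^*(P\|\overline{P}) = D(P\|\overline{P}). \]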

As a further analysis, we focus on the decreasing exponent of the error probability of the first type under an exponential 
constraint for the error probability of the second type. When the decreasing exponent for the error probability of the second 
type is greater than the relative entropy $D(P\|\overline{P})$, the error probability of the first type converges to 1. In this case, we 
focus on the decreasing exponent of the probability of correctly accepting the null hypothesis $P$. For this purpose, we define 

\[ B_e(r|P\|\overline{P}) := \sup_{\{f_n\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{P^n} f_n \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log E_{\overline{P}^n}(1 - f_n) \ge r \Bigr\}, \]
\[ B_e^*(r|P\|\overline{P}) := \inf_{\{f_n\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{P^n}(1 - f_n) \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log E_{\overline{P}^n}(1 - f_n) \ge r \Bigr\}. \]

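Then, the two quantities are calculated as 

\[ B_e(r|P\|\overline{P}) = \min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) = \sup_{0 \le s \le 1} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s}, \qquad (2) \]
\[ B_e^*(r|P\|\overline{P}) = \min_{Q: D(Q\|\overline{P}) \le r} \bigl( D(Q\|P) + r - D(Q\|\overline{P}) \bigr) = \sup_{s \le 0} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s}. \qquad (3) \]

The first expressions of (2) and (3) are illustrated by Figs. 1 and 2. 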
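The variational expressions $\sup_s \frac{-sr - \phi(s|P\|\overline{P})}{1-s}$ over $0 \le s \le 1$ (Hoeffding) and over $s \le 0$ (Han-Kobayashi) can be evaluated numerically. The sketch below assumes a finite output set with full support and replaces each supremum by a maximum over a grid; the function names, the truncation point `s_min`, and the example distributions are assumptions for illustration only.

```python
import math

def phi(s, P, Pbar):
    """phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s (full-support lists)."""
    return math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Pbar)))

def hoeffding_exponent(r, P, Pbar, steps=2000):
    """Grid approximation of sup_{0<=s<1} (-s*r - phi(s)) / (1 - s); s = 1 is excluded."""
    return max((-s * r - phi(s, P, Pbar)) / (1 - s)
               for s in (k / steps for k in range(steps)))

def han_kobayashi_exponent(r, P, Pbar, s_min=-50.0, steps=5000):
    """Grid approximation of sup_{s<=0} (-s*r - phi(s)) / (1 - s), truncated at s_min."""
    return max((-s * r - phi(s, P, Pbar)) / (1 - s)
               for s in (s_min * k / steps for k in range(steps + 1)))

# Hypothetical binary distributions, chosen only for illustration.
P = [0.5, 0.5]
Pbar = [0.9, 0.1]
D = sum(p * math.log(p / q) for p, q in zip(P, Pbar))  # relative entropy D(P||Pbar)
```

Both functions return (numerically) zero at $r = D(P\|\overline{P})$; the first is positive for smaller $r$ and the second is positive for larger $r$.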
Now, we define the new function $B_e(r)$: 

\[ B_e(r) := \begin{cases} B_e(r|P\|\overline{P}) & r \le D(P\|\overline{P}) \\ -B_e^*(r|P\|\overline{P}) & r > D(P\|\overline{P}). \end{cases} \]

Then, its graph is shown in Fig. 3. 

Fig. 3. Graph of $B_e(r)$ 

In order to give other characterizations of (2), we introduce a one-parameter family 

\[ P_{s,P,\overline{P}}(dy) := \frac{1}{\Phi(s|P\|\overline{P})} \Bigl( \frac{\partial \overline{P}}{\partial P}(y) \Bigr)^{s} P(dy), \]

which is abbreviated as $P_s$. Then, since $\phi(s)$ is $C^1$ continuous, 

\[ D(P_s\|P_1) = (s-1)\phi'(s) - \phi(s), \quad s \in (-\infty, 1] \qquad (4) \]
\[ D(P_0\|P_s) = \phi(s) - s\phi'(0), \quad s \in [0, \infty). \qquad (5) \]

Since 

\[ \frac{d}{ds} \bigl( (s-1)\phi'(s) - \phi(s) \bigr) = (s-1)\phi''(s) \le 0, \]

$D(P_s\|P_1)$ is monotonically decreasing with respect to $s$. 
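The identity $D(P_s\|P_1) = (s-1)\phi'(s) - \phi(s)$ and its monotonicity in $s$ can be checked numerically. The sketch below assumes a finite output set with full support, where $P_s(y) \propto P(y)^{1-s}\overline{P}(y)^s$ and $\phi'(s) = E_{P_s}[\log(\overline{P}/P)]$; all function names and the example distributions are assumptions for illustration.

```python
import math

def phi(s, P, Pbar):
    """phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s."""
    return math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Pbar)))

def tilted(s, P, Pbar):
    """One-parameter family P_s(y), proportional to P(y)^(1-s) * Pbar(y)^s."""
    weights = [p ** (1 - s) * q ** s for p, q in zip(P, Pbar)]
    total = sum(weights)
    return [w / total for w in weights]

def phi_prime(s, P, Pbar):
    """phi'(s), computed as the mean of log(Pbar/P) under P_s."""
    return sum(t * math.log(q / p) for t, p, q in zip(tilted(s, P, Pbar), P, Pbar))

def relative_entropy(A, B):
    """D(A||B) for full-support lists."""
    return sum(a * math.log(a / b) for a, b in zip(A, B) if a > 0)

# Hypothetical binary distributions; P_0 = P and P_1 = Pbar.
P = [0.5, 0.5]
Pbar = [0.9, 0.1]

def divergence_to_p1(s):
    """Left-hand side D(P_s || P_1) of the identity."""
    return relative_entropy(tilted(s, P, Pbar), Pbar)

def identity_rhs(s):
    """Right-hand side (s - 1) * phi'(s) - phi(s) of the identity."""
    return (s - 1) * phi_prime(s, P, Pbar) - phi(s, P, Pbar)
```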

As is mentioned in Theorem 4 of Blahut [2], when $r \le D(P\|\overline{P})$, there exists $s_r \in [0, 1]$ such that 

\[ \min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) = D(P_{s_r}\|P_0). \]

Then, (4) and (5) imply that 

\[ r = D(P_{s_r}\|P_1) = (s_r - 1)\phi'(s_r) - \phi(s_r). \]

Thus, we obtain another expression: 

\[ \min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) = \min_{s \in [0,1]: D(P_s\|\overline{P}) \le r} D(P_s\|P). \qquad (6) \]

On the other hand, 

\[ \frac{d}{ds} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s} = \frac{-r + (s-1)\phi'(s) - \phi(s)}{(1-s)^2} = \frac{-r + D(P_s\|P_1)}{(1-s)^2}. \qquad (7) \]

Since $D(P_s\|P_1)$ is monotonically decreasing with respect to $s$, $\frac{d}{ds} \frac{-sr - \phi(s|P\|\overline{P})}{1-s} = 0$ if and only if $s = s_r$. Hence, the equation 

\[ \min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) = \sup_{0 \le s \le 1} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s} \qquad (8) \]

can be checked. 
In the following, we present some explanations concerning (3). As is mentioned by Han-Kobayashi [10] and 
Ogawa-Nagaoka [18], when $r_0 := D(P_{-\infty}\|P_1) \ge r \ge D(P\|\overline{P})$, the relation 

\[ B_e^*(r|P\|\overline{P}) = D(P_{s_r}\|P_0) \]

holds, where $s_r \in (-\infty, 0]$ is defined as 

\[ r = D(P_{s_r}\|P_1) = (s_r - 1)\phi'(s_r) - \phi(s_r). \]

Thus, similar to (6) and (8), the relation 

\[ \min_{Q: D(Q\|\overline{P}) \le r} \bigl( D(Q\|P) + r - D(Q\|\overline{P}) \bigr) = D(P_{s_r}\|P) = \sup_{s \le 0} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s} \qquad (9) \]

holds, where $s_r \le 0$ is defined by $D(P_{s_r}\|\overline{P}) = r$ [18]. 

As mentioned by Nakagawa-Kanaya [22], when $r > r_0$, the relation 

\[ \min_{Q: D(Q\|\overline{P}) \le r} \bigl( D(Q\|P) + r - D(Q\|\overline{P}) \bigr) = D(P_{-\infty}\|P) + r - D(P_{-\infty}\|\overline{P}) = \min_{Q: D(Q\|\overline{P}) \le r_0} \bigl( D(Q\|P) + r_0 - D(Q\|\overline{P}) \bigr) + r - r_0 \]

holds. This bound is attained by the following randomized test. The hypothesis P is accepted with the probability only when 
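the logarithmic likelihood ratio takes the maximum value $r_0$. Since $D(P_s\|P_1) < r$, (7) implies that 

\[ \sup_{s \le 0} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s} = \lim_{s \to -\infty} \frac{-sr - \phi(s|P\|\overline{P})}{1 - s} = \lim_{s \to -\infty} \frac{-sr_0 - \phi(s|P\|\overline{P})}{1 - s} + r - r_0 = \min_{Q: D(Q\|\overline{P}) \le r_0} \bigl( D(Q\|P) + r_0 - D(Q\|\overline{P}) \bigr) + r - r_0. \qquad (10) \]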

Remark 1: The classical Hoeffding bound in information theory is due to Blahut [2] and Csiszár-Longo [4]. The corresponding 
ideas in statistics were first put forward by Hoeffding [16], from whom the bound received its name. Some authors prefer to 
refer to this bound as the Hoeffding-Blahut-Csiszár-Longo bound. 

On the other hand, Han-Kobayashi [10] gave the first equation of (3), and proved that this equation holds among non-randomized 
tests when $r_0 \ge r \ge D(P\|\overline{P})$. They pointed out that the minimum $\min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) + r - D(Q\|\overline{P})$ can be attained 
by $Q$ satisfying $D(Q\|\overline{P}) = r$. Ogawa-Nagaoka [18] showed the second equation of (3) for this case. 

Nakagawa-Kanaya [22] proved the first equation when $r > r_0$. Indeed, as pointed out by Nakagawa-Kanaya [22], when $r > r_0$, 
no non-randomized test can attain the minimum $\min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) + r - D(Q\|\overline{P})$. In this case, the minimum 
$\min_{Q: D(Q\|\overline{P}) \le r} D(Q\|P) + r - D(Q\|\overline{P})$ cannot be attained by $Q$ satisfying $D(Q\|\overline{P}) = r$. 

III. Main result: Adaptive method 

Let us focus on two spaces, the set of input signals $\mathcal{X}$ and the set of outputs $\mathcal{Y}$. In this case, the channel from $\mathcal{X}$ to $\mathcal{Y}$ is 
described by the map from the set $\mathcal{X}$ to the set of probability distributions on $\mathcal{Y}$. That is, given a channel $W$, $W_x$ represents 
the output distribution when the input is $x \in \mathcal{X}$. When $\mathcal{X}$ and $\mathcal{Y}$ have finite elements, the channel is given by a transition 
matrix. The main topic is the discrimination of two classical channels $W$ and $\overline{W}$. In particular, we treat its asymptotic analysis 
when we can use the unknown channel only $n$ times. That is, we discriminate two hypotheses, the null hypothesis $H_0 : W^n$ 
versus the alternative hypothesis $H_1 : \overline{W}^n$, where $W^n$ and $\overline{W}^n$ are the $n$ uses of the channels $W$ and $\overline{W}$, respectively. Then, our problem 
is to decide which hypothesis is true based on $n$ inputs $x_1, \ldots, x_n$ and $n$ outputs $y_1, \ldots, y_n$. In this setting, it is allowed to 
choose the $k$-th input adaptively based on the previous $k - 1$ outputs. We choose the $k$-th input $x_k$ subject to the distribution 
$P^k_{(x_1,y_1),\ldots,(x_{k-1},y_{k-1})}(x_k)$ on $\mathcal{X}$. That is, the $k$-th input $x_k$ depends on $k$ conditional distributions $\vec{P}^k = (P^1, P^2, \ldots, P^k)$. 
Hence, our decision method is described by $n$ conditional distributions $\vec{P}^n = (P^1, P^2, \ldots, P^n)$ and a $[0,1]$-valued function 
$f_n$ on $(\mathcal{X} \times \mathcal{Y})^n$. In this case, when we choose $n$ inputs $x_1, \ldots, x_n$ and observe $n$ outputs $y_1, \ldots, y_n$, we accept the alternative 
hypothesis $\overline{W}$ with the probability $f_n(x_1, y_1, \ldots, x_n, y_n)$. That is, our scheme is illustrated by Fig. 4. 

In order to treat this problem mathematically, we introduce the following notation. For a channel $W$ from $\mathcal{X}$ to $\mathcal{Y}$ and a 
distribution $P$ on $\mathcal{X}$, we define two notations, the distribution $WP$ on $\mathcal{X} \times \mathcal{Y}$ and the distribution $W \cdot P$ on $\mathcal{Y}$ as 

\[ WP(x, y) := W_x(y) P(x), \]
\[ W \cdot P(y) := \int_{\mathcal{X}} W_x(y) P(dx). \]

Using the distribution $WP$, we define two quantities: 

\[ D(W\|\overline{W}|P) := D(WP\|\overline{W}P), \]
\[ \phi(s|W\|\overline{W}|P) := \phi(s|WP\|\overline{W}P). \]



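[Figure omitted: channel discrimination with adaptive improvement. Each of the $n$ uses applies the unknown channel ($W$ or $\overline{W}$), and adaptive improvement of the input is allowed between uses.]

Fig. 4. The adaptive method 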



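Based on $k$ conditional distributions $\vec{P}^k = (P^1, P^2, \ldots, P^k)$, we define the following distributions: 

\[ Q_{W,\vec{P}^n} := W P^n \, W P^{n-1} \cdots W P^1, \]
\[ P_{W,\vec{P}^n} := P^n \cdot Q_{W,\vec{P}^{n-1}}, \]
\[ P_{s,W|\overline{W},\vec{P}^n} := P^n \cdot Q_{s,W|\overline{W},\vec{P}^{n-1}}, \]

where $Q_{s,W|\overline{W},\vec{P}^{n-1}}$ denotes the member $P_{s,Q,\overline{Q}}$ of the one-parameter family of Section II with $Q = Q_{W,\vec{P}^{n-1}}$ and $\overline{Q} = Q_{\overline{W},\vec{P}^{n-1}}$. 

Then, the first type of error probability is given by $E_{Q_{W,\vec{P}^n}} f_n$, and the second type of error probability is given by $E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n)$. 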
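In order to treat this problem, we introduce the following quantities: 

\[ C(W, \overline{W}) := \lim_{n\to\infty} -\frac{1}{n} \log \Bigl( \min_{\vec{P}^n, f_n} E_{Q_{W,\vec{P}^n}} f_n + E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \Bigr), \]
\[ \beta^*_n(\epsilon) := \min_{\vec{P}^n, f_n} \{ E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \mid E_{Q_{W,\vec{P}^n}} f_n \le \epsilon \}, \]

and 

\[ B(W\|\overline{W}) := \sup_{\{(\vec{P}^n, f_n)\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \Bigm| \lim_{n\to\infty} E_{Q_{W,\vec{P}^n}} f_n = 0 \Bigr\}, \]
\[ B^*(W\|\overline{W}) := \sup_{\{(\vec{P}^n, f_n)\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \Bigm| \lim_{n\to\infty} E_{Q_{W,\vec{P}^n}} f_n < 1 \Bigr\}, \]
\[ B_e(r|W\|\overline{W}) := \sup_{\{(\vec{P}^n, f_n)\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{W,\vec{P}^n}} f_n \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \ge r \Bigr\}, \]
\[ B_e^*(r|W\|\overline{W}) := \inf_{\{(\vec{P}^n, f_n)\}} \Bigl\{ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{W,\vec{P}^n}}(1 - f_n) \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \ge r \Bigr\}. \]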



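We obtain the following channel version of Stein's lemma. 
Theorem 1: Assume that $\phi(s|W_x\|\overline{W}_x)$ is $C^1$ continuous, and 

\[ \lim_{\epsilon \to +0} \frac{\phi(-\epsilon|W\|\overline{W})}{\epsilon} = \sup_{x \in \mathcal{X}} D(W_x\|\overline{W}_x), \qquad (11) \]

where $\phi(s|W\|\overline{W}) := \sup_{x \in \mathcal{X}} \phi(s|W_x\|\overline{W}_x) = \sup_{P \in \mathcal{P}(\mathcal{X})} \phi(s|W\|\overline{W}|P)$, and $\mathcal{P}(\mathcal{X})$ is the set of distributions on $\mathcal{X}$. 
Then, 

\[ B(W\|\overline{W}) = B^*(W\|\overline{W}) = \overline{D} := \sup_{x \in \mathcal{X}} D(W_x\|\overline{W}_x). \qquad (12) \]

The following is another expression of Stein's lemma. 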
Corollary 1: Under the same assumption, 

\[ \lim_{n\to\infty} -\frac{1}{n} \log \beta^*_n(\epsilon) = \sup_{x \in \mathcal{X}} D(W_x\|\overline{W}_x). \]
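For finite input and output sets, the channel Stein exponent $\sup_x D(W_x\|\overline{W}_x)$ is a maximum over finitely many inputs. The following sketch illustrates this under that assumption; the dictionary representation of a channel, the function names, and the two example channels are hypothetical.

```python
import math

def relative_entropy(P, Q):
    """D(P||Q) for full-support lists, natural logarithm."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def channel_stein_exponent(W, Wbar):
    """Return (best_input, max_x D(W_x || Wbar_x)) for channels given as dicts
    mapping each input x to its output distribution."""
    return max(((x, relative_entropy(W[x], Wbar[x])) for x in W),
               key=lambda pair: pair[1])

# Hypothetical binary-input, binary-output channels, chosen only for illustration.
W = {0: [0.7, 0.3], 1: [0.5, 0.5]}
Wbar = {0: [0.6, 0.4], 1: [0.9, 0.1]}
best_input, exponent = channel_stein_exponent(W, Wbar)
```

Here input 1 separates the two channels far better than input 0, so repeating input 1 in every use already attains the exponent.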

Condition (11) can be replaced by another condition. 
Lemma 1: When any element $x \in \mathcal{X}$ satisfies 

\[ \phi'(0|W_x\|\overline{W}_x) = -D(W_x\|\overline{W}_x) \]

and there exists a real number $\epsilon > 0$ such that 

\[ C_1 := \sup_{x \in \mathcal{X}} \sup_{s \in [-\epsilon, 0]} \frac{d^2 \phi(s|W_x\|\overline{W}_x)}{ds^2} < \infty, \qquad (13) \]

then condition (11) holds. 

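In addition, we obtain a channel version of the Hoeffding bound. 
Theorem 2: When 

\[ \sup_{x \in \mathcal{X}} \sup_{s \in [0,1]} \frac{d^2 \phi(s|W_x\|\overline{W}_x)}{ds^2} < \infty \qquad (14) \]

and 

\[ \sup_{x \in \mathcal{X}} D(W_x\|\overline{W}_x) < \infty, \]

then 

\[ B_e(r|W\|\overline{W}) = \sup_{x \in \mathcal{X}} \sup_{0 \le s \le 1} \frac{-sr - \phi(s|W_x\|\overline{W}_x)}{1 - s} = \sup_{x \in \mathcal{X}} \min_{Q: D(Q\|\overline{W}_x) \le r} D(Q\|W_x). \qquad (15) \]

Corollary 2: Under the same assumption, 

\[ C(W, \overline{W}) = \sup_{x \in \mathcal{X}} \Bigl( - \min_{0 \le s \le 1} \phi(s|W_x\|\overline{W}_x) \Bigr). \qquad (16) \]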

These arguments imply that adaptive improvement does not improve the performance in the above senses. For example, 
when we apply the best input $x_M := \operatorname{argmax}_x D(W_x\|\overline{W}_x)$ to all of the $n$ channels, we can achieve the optimal performance in 
the sense of the Stein bound. The same fact is true concerning the Hoeffding bound and the Chernoff bound. 
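In the same spirit, the channel Chernoff exponent is obtained by optimizing over a single repeated input. The sketch below assumes finite input and output sets with full support and approximates the minimum over $s$ on a grid; the function names and the example channels are assumptions for illustration.

```python
import math

def phi(s, P, Pbar):
    """phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s (full-support lists)."""
    return math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Pbar)))

def chernoff_exponent(P, Pbar, steps=1000):
    """Grid approximation of -min_{0<=s<=1} phi(s|P||Pbar)."""
    return -min(phi(k / steps, P, Pbar) for k in range(steps + 1))

def channel_chernoff_exponent(W, Wbar):
    """Return (best_input, max_x chernoff_exponent(W_x, Wbar_x)): the exponent of the
    deterministic non-adaptive method that repeats one input in every channel use."""
    return max(((x, chernoff_exponent(W[x], Wbar[x])) for x in W),
               key=lambda pair: pair[1])

# Hypothetical binary-input, binary-output channels, chosen only for illustration.
W = {0: [0.7, 0.3], 1: [0.5, 0.5]}
Wbar = {0: [0.6, 0.4], 1: [0.9, 0.1]}
best_input, exponent = channel_chernoff_exponent(W, Wbar)
```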
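Proof: The relation 

\[ C(W, \overline{W}) = \sup \{ r \mid B_e(r|W\|\overline{W}) \ge r \} \]

holds. Since 

\[ \sup \Bigl\{ r \Bigm| \sup_{x \in \mathcal{X}} \sup_{0 \le s \le 1} \frac{-sr - \phi(s|W_x\|\overline{W}_x)}{1 - s} \ge r \Bigr\} = \sup_{x \in \mathcal{X}} \sup \Bigl\{ r \Bigm| \sup_{0 \le s \le 1} \frac{-sr - \phi(s|W_x\|\overline{W}_x)}{1 - s} \ge r \Bigr\} = \sup_{x \in \mathcal{X}} \Bigl( - \min_{0 \le s \le 1} \phi(s|W_x\|\overline{W}_x) \Bigr), \]

the relation (16) holds. ■ 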
The channel version of the Han-Kobayashi bound is given as follows. 
Theorem 3: When $\phi(s|W_x\|\overline{W}_x)$ is $C^1$ continuous, then 

\[ B_e^*(r|W\|\overline{W}) = \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} = \inf_{P \in \mathcal{P}(\mathcal{X})} \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s} = \inf_{P \in \mathcal{P}_2(\mathcal{X})} \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s}, \qquad (17) \]

where $\mathcal{P}_2(\mathcal{X})$ is the set of distributions on $\mathcal{X}$ that take positive probability on at most two elements. 
As shown in Section IV, the equality 

\[ \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} = \inf_{x \in \mathcal{X}} \sup_{s \le 0} \frac{-sr - \phi(s|W_x\|\overline{W}_x)}{1 - s} \qquad (18) \]

does not necessarily hold in general. In order to understand the meaning of this fact, we assume that the equation (18) does not 
hold. When we apply the same input $x$ to all channels, the best performance cannot be achieved. However, the best performance 
can be achieved by the following method. Assume that the best input distribution $\operatorname{argmin}_{P \in \mathcal{P}_2(\mathcal{X})} \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s}$ 
has the support $\{x, x'\}$, and the probabilities $\lambda$ and $1 - \lambda$. Then, applying $x$ or $x'$ to all channels with the probability $\lambda$ and 
$1 - \lambda$, we can achieve the best performance in the sense of the Han-Kobayashi bound. That is, the structure of the optimal strategy 
for the Han-Kobayashi bound is more complex than those of the above cases. 



IV. Simple example 

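In this section, we treat a simple example that does not satisfy (18). For four given parameters $p$, $q$, $a > 1$, $b > 1$, we define 
the channels $W$ and $\overline{W}$: 

\[ W_0(0) := ap, \quad W_0(1) := 1 - ap, \]
\[ \overline{W}_0(0) := p, \quad \overline{W}_0(1) := 1 - p, \]
\[ W_1(0) := bq, \quad W_1(1) := 1 - bq, \]
\[ \overline{W}_1(0) := q, \quad \overline{W}_1(1) := 1 - q. \]

Then, we obtain 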



lim = a, 

:— >— oo S 



In this case, 

\[ D(W_0\|\overline{W}_0) = ap \log a + (1 - ap) \log \frac{1 - ap}{1 - p}, \]
\[ D(W_1\|\overline{W}_1) = bq \log b + (1 - bq) \log \frac{1 - bq}{1 - q}. \]

When $a > b$ and $D(W_0\|\overline{W}_0) < D(W_1\|\overline{W}_1)$, the magnitude relation between $\phi(s|W_0\|\overline{W}_0)$ and $\phi(s|W_1\|\overline{W}_1)$ on $(-\infty, 0)$ 
depends on $s \in (-\infty, 0)$. For example, the case of $a = 100$, $b = 1.5$, $p = 0.0001$, $q = 0.65$ is shown in Fig. 5. In this case, 
$B_e^*(r|W_0\|\overline{W}_0)$, $B_e^*(r|W_1\|\overline{W}_1)$, and $B_e^*(r|W\|\overline{W})$ are calculated as in Fig. 6. Then, the equality (18) does not hold. 
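The crossing of the two curves $\phi(s|W_0\|\overline{W}_0)$ and $\phi(s|W_1\|\overline{W}_1)$ for the parameter values quoted above can be reproduced directly. The sketch below assumes the binary channels $W_0 = (ap, 1-ap)$, $\overline{W}_0 = (p, 1-p)$, $W_1 = (bq, 1-bq)$, $\overline{W}_1 = (q, 1-q)$ and the convention $\phi(s|P\|\overline{P}) = \log \sum_y P(y)^{1-s}\overline{P}(y)^s$; the function and variable names are chosen only for this illustration.

```python
import math

def phi(s, P, Pbar):
    """phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s (full-support lists)."""
    return math.log(sum(u ** (1 - s) * v ** s for u, v in zip(P, Pbar)))

def relative_entropy(P, Pbar):
    """D(P||Pbar) for full-support lists."""
    return sum(u * math.log(u / v) for u, v in zip(P, Pbar))

# Parameter values of the example in the text.
a, b, p, q = 100, 1.5, 0.0001, 0.65
W0, W0bar = [a * p, 1 - a * p], [p, 1 - p]
W1, W1bar = [b * q, 1 - b * q], [q, 1 - q]

# Compare the two curves at a point near 0 and at s = -1.
for s in (-0.2, -1.0):
    print(s, phi(s, W0, W0bar), phi(s, W1, W1bar))
```

Near $s = 0$ the curve for input 1 lies above the curve for input 0, while at $s = -1$ the order is reversed, so neither input dominates on the whole negative axis.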



Fig. 5. Magnitude relation between $\phi(s|W_0\|\overline{W}_0)$ and $\phi(s|W_1\|\overline{W}_1)$ on $(-1, 0)$. The upper solid line indicates $\phi(s|W_0\|\overline{W}_0)$, and the dotted line indicates 
$\phi(s|W_1\|\overline{W}_1)$. 

Fig. 6. Magnitude relation between $B_e^*(r|W_0\|\overline{W}_0)$, $B_e^*(r|W_1\|\overline{W}_1)$, and $B_e^*(r|W\|\overline{W})$ on $(-1, 0)$. The upper solid line indicates $B_e^*(r|W_0\|\overline{W}_0)$, the 
dotted line indicates $B_e^*(r|W_1\|\overline{W}_1)$, and the lower solid line indicates $B_e^*(r|W\|\overline{W})$. 
V. Application to adaptive quantum state discrimination 

Quantum state discrimination between two states $\rho$ and $\sigma$ on a $d$-dimensional system $\mathcal{H}$ with $n$ copies by one-way LOCC is 
formulated as follows. We choose the first POVM $M_1$ and obtain the data $y_1$ through the measurement $M_1$. In the $k$-th step, we 
choose the $k$-th POVM $M_k((M_1, y_1), \ldots, (M_{k-1}, y_{k-1}))$ depending on $(M_1, y_1), \ldots, (M_{k-1}, y_{k-1})$. Then, we obtain the $k$-th 
data $y_k$ through $M_k((M_1, y_1), \ldots, (M_{k-1}, y_{k-1}))$. Therefore, this problem can be regarded as classical channel discrimination 
with the correspondence $W_M(y) = \operatorname{Tr} M(y)\rho$ and $\overline{W}_M(y) = \operatorname{Tr} M(y)\sigma$. That is, in this case, the set of input signals corresponds 
to the set of extremal points of the set of POVMs on the given system $\mathcal{H}$. The proposed scheme is illustrated in Fig. 7. 
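The correspondence $W_M(y) = \operatorname{Tr} M(y)\rho$, $\overline{W}_M(y) = \operatorname{Tr} M(y)\sigma$ turns each POVM into a pair of classical output distributions. The following sketch illustrates this for a single qubit with matrices stored as nested lists; the two states, the two projective measurements, and all function names are hypothetical choices made only for this example, and only two candidate measurements are compared rather than all POVMs.

```python
import math

def trace_product(A, B):
    """Tr(A B) for square matrices given as nested lists."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def outcome_distribution(povm, state):
    """W_M(y) = Tr M(y) state for each POVM element M(y)."""
    return [trace_product(M, state).real for M in povm]

def relative_entropy(P, Q):
    """D(P||Q) for full-support lists."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

# Hypothetical qubit states (both strictly positive).
rho = [[0.9, 0.1], [0.1, 0.1]]
sigma = [[0.5, 0.0], [0.0, 0.5]]

# Two candidate measurements: computational (Z) basis and Hadamard (X) basis.
povms = {
    "Z": [[[1, 0], [0, 0]], [[0, 0], [0, 1]]],
    "X": [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]],
}

# Classical Stein exponent of each measurement, viewed as a channel input.
divergences = {
    name: relative_entropy(outcome_distribution(M, rho), outcome_distribution(M, sigma))
    for name, M in povms.items()
}
best_measurement = max(divergences, key=divergences.get)
```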



[Figure omitted: one-way adaptive improvement. Each copy of the unknown state ($\rho$ or $\sigma$) is measured by a measurement $M$, and adaptive improvement of later measurements is allowed.]

Fig. 7. Adaptive quantum state discrimination 

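Now, we assume that $\rho > 0$ and $\sigma > 0$. In this case, $\mathcal{X}$ is compact, and the map $(s, M) \mapsto \frac{d^2 \phi(s|W_M\|\overline{W}_M)}{ds^2}$ is continuous. 
Then, the condition (13) holds. Therefore, one-way improvement does not improve the performance in the sense of the Stein 
bound, the Chernoff bound, the Hoeffding bound, or the Han-Kobayashi bound. That is, we obtain 

\[ B(W\|\overline{W}) = B^*(W\|\overline{W}) = \max_{M: \mathrm{POVM}} D(P^M_\rho\|P^M_\sigma), \]
\[ B_e(r|W\|\overline{W}) = \max_{M: \mathrm{POVM}} \sup_{0 \le s \le 1} \frac{-sr - \phi(s|P^M_\rho\|P^M_\sigma)}{1 - s}, \]
\[ B_e^*(r|W\|\overline{W}) = \sup_{s \le 0} \frac{-sr - \max_{M: \mathrm{POVM}} \phi(s|P^M_\rho\|P^M_\sigma)}{1 - s}, \]

where $P^M_\rho$ and $P^M_\sigma$ denote the output distributions of the measurement $M$ applied to $\rho$ and $\sigma$, respectively. 
Therefore, there exists a difference between one-way LOCC and collective measurement. 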



VI. Proof of the Stein bound: (12) 
Now, we prove the Stein bound (12). For any $x \in \mathcal{X}$, by choosing the input $x$ in all $n$ uses, we obtain 

\[ B(W\|\overline{W}) \ge D(W_x\|\overline{W}_x). \]

Taking the supremum, we have 

\[ B(W\|\overline{W}) \ge \sup_{x \in \mathcal{X}} D(W_x\|\overline{W}_x). \]

Furthermore, from the definition, it is trivial that 

\[ B(W\|\overline{W}) \le B^*(W\|\overline{W}). \]

Therefore, it is sufficient to show the strong converse part: 

\[ B^*(W\|\overline{W}) \le \overline{D}. \qquad (19) \]

However, in preparation for the proof of (15), we present a proof of the weak converse part: 

\[ B(W\|\overline{W}) \le \overline{D}, \qquad (20) \]
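which is a weaker argument than (19), and is valid without assumption (11). In the following proof, it is essential to evaluate 
the KL-divergence concerning the obtained data. 
In order to prove (20), we prove that 

\[ \limsup_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \le \overline{D} \qquad (21) \]

when 

\[ E_{Q_{W,\vec{P}^n}} f_n \to 0. \qquad (22) \]

It follows from the definitions of $Q_{W,\vec{P}^n}$ and $Q_{\overline{W},\vec{P}^n}$ that 

\[ D(Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) = \sum_{k=1}^{n} D(W\|\overline{W}|P_{W,\vec{P}^k}). \]

Since $-E_{Q_{W,\vec{P}^n}} f_n \log E_{Q_{\overline{W},\vec{P}^n}} f_n \ge 0$, the information processing inequality concerning the KL divergence yields the following, where $h(x) := -x \log x - (1-x) \log(1-x)$ is the binary entropy: 

\[ -h(E_{Q_{W,\vec{P}^n}}(1 - f_n)) - (E_{Q_{W,\vec{P}^n}}(1 - f_n)) \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \]
\[ \le E_{Q_{W,\vec{P}^n}}(1 - f_n) \bigl( \log E_{Q_{W,\vec{P}^n}}(1 - f_n) - \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \bigr) + E_{Q_{W,\vec{P}^n}} f_n \bigl( \log E_{Q_{W,\vec{P}^n}} f_n - \log E_{Q_{\overline{W},\vec{P}^n}} f_n \bigr) \]
\[ \le D(Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) = \sum_{k=1}^{n} D(W\|\overline{W}|P_{W,\vec{P}^k}) \le n\overline{D}. \qquad (23) \]

That is, 

\[ -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \le \frac{\overline{D} + \frac{1}{n} h(E_{Q_{W,\vec{P}^n}}(1 - f_n))}{E_{Q_{W,\vec{P}^n}}(1 - f_n)}. \qquad (24) \]

Therefore, (22) yields (21). 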
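The chain rule for the KL-divergence of the observed data and the resulting bound by $n \sup_x D(W_x\|\overline{W}_x)$ can be verified by brute force for a small adaptive strategy. The sketch below assumes binary inputs and outputs and $n = 3$; the adaptive rule `policy`, the example channels, and all names are hypothetical and serve only as an illustration.

```python
import math

def relative_entropy(P, Q):
    """D(P||Q) for distributions stored as dicts over the same keys."""
    return sum(p * math.log(p / Q[k]) for k, p in P.items() if p > 0)

def joint_distribution(channel, policy, n):
    """Law of ((x_1, y_1), ..., (x_n, y_n)): input k is drawn from policy(history),
    then output y_k is drawn from channel[x_k]."""
    dist = {(): 1.0}
    for _ in range(n):
        extended = {}
        for history, prob in dist.items():
            for x, px in policy(history).items():
                for y, py in enumerate(channel[x]):
                    extended[history + ((x, y),)] = prob * px * py
        dist = extended
    return dist

def policy(history):
    """Hypothetical adaptive rule: start with input 0, repeat the last input after
    output 0, and pick an input uniformly at random after output 1."""
    if not history:
        return {0: 1.0}
    last_x, last_y = history[-1]
    return {last_x: 1.0} if last_y == 0 else {0: 0.5, 1: 0.5}

# Hypothetical channels, chosen only for illustration.
W = {0: [0.7, 0.3], 1: [0.5, 0.5]}
Wbar = {0: [0.6, 0.4], 1: [0.9, 0.1]}
n = 3

Q = joint_distribution(W, policy, n)
Qbar = joint_distribution(Wbar, policy, n)
per_input = {x: relative_entropy(dict(enumerate(W[x])), dict(enumerate(Wbar[x]))) for x in W}

total = relative_entropy(Q, Qbar)  # KL-divergence of the whole data
# Chain rule: expected sum of per-use divergences of the inputs actually chosen.
chain = sum(prob * sum(per_input[x] for x, _ in history) for history, prob in Q.items())
```

The policy terms cancel in the likelihood ratio, which is why `total` and `chain` agree and why neither can exceed `n * max(per_input.values())`.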

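Next, we prove the strong converse part, i.e., we show that 

\[ E_{Q_{W,\vec{P}^n}}(1 - f_n) \to 0 \qquad (25) \]

when 

\[ r := \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) > \overline{D}. \qquad (26) \]

Since 

\[ \Phi(s|Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) = \Phi(s|Q_{W,\vec{P}^{n-1}}\|Q_{\overline{W},\vec{P}^{n-1}}) \int_{\mathcal{X}} \Phi(s|W_x\|\overline{W}_x) \, P_{s,W|\overline{W},\vec{P}^n}(dx), \]

we obtain 

\[ \phi(s|Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) = \phi(s|Q_{W,\vec{P}^{n-1}}\|Q_{\overline{W},\vec{P}^{n-1}}) + \phi(s|W\|\overline{W}|P_{s,W|\overline{W},\vec{P}^n}). \qquad (27) \]

Applying (27) inductively, we obtain the relation 

\[ \phi(s|Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) = \sum_{k=1}^{n} \phi(s|W\|\overline{W}|P_{s,W|\overline{W},\vec{P}^k}) \le n\phi(s|W\|\overline{W}). \qquad (28) \]

Since the information quantity $\phi(s|P\|\overline{P})$ satisfies the information processing inequality, we have 

\[ (E_{Q_{W,\vec{P}^n}}(1 - f_n))^{1-s} (E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n))^{s} \]
\[ \le (E_{Q_{W,\vec{P}^n}}(1 - f_n))^{1-s} (E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n))^{s} + (E_{Q_{W,\vec{P}^n}} f_n)^{1-s} (E_{Q_{\overline{W},\vec{P}^n}} f_n)^{s} \]
\[ \le e^{\phi(s|Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n})} \le e^{n\phi(s|W\|\overline{W})} \]

for $s \le 0$. Taking the logarithm, we obtain 

\[ (1 - s) \log E_{Q_{W,\vec{P}^n}}(1 - f_n) \le -s \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) + n\phi(s|W\|\overline{W}). \qquad (29) \]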
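The bound $\phi(s|Q_{W,\vec{P}^n}\|Q_{\overline{W},\vec{P}^n}) \le n \sup_x \phi(s|W_x\|\overline{W}_x)$ for an adaptive strategy can also be checked by brute force. The sketch below assumes binary inputs and outputs, $n = 3$, and the convention $\phi(s|P\|\overline{P}) = \log \sum P^{1-s}\overline{P}^s$; the adaptive rule, the example channels, and all names are hypothetical.

```python
import math

def log_phi(s, P, Pbar):
    """phi(s|P||Pbar) for full-support distributions stored as dicts over the same keys."""
    return math.log(sum(p ** (1 - s) * Pbar[k] ** s for k, p in P.items()))

def joint_distribution(channel, policy, n):
    """Law of ((x_1, y_1), ..., (x_n, y_n)) when input k is drawn from policy(history)."""
    dist = {(): 1.0}
    for _ in range(n):
        extended = {}
        for history, prob in dist.items():
            for x, px in policy(history).items():
                for y, py in enumerate(channel[x]):
                    extended[history + ((x, y),)] = prob * px * py
        dist = extended
    return dist

def policy(history):
    """Hypothetical adaptive rule: start with input 0, then switch to input 1
    as soon as an output 1 has been observed."""
    return {1: 1.0} if any(y == 1 for _, y in history) else {0: 1.0}

# Hypothetical channels, chosen only for illustration.
W = {0: [0.7, 0.3], 1: [0.5, 0.5]}
Wbar = {0: [0.6, 0.4], 1: [0.9, 0.1]}
n = 3
Q = joint_distribution(W, policy, n)
Qbar = joint_distribution(Wbar, policy, n)

def single_use_bound(s):
    """sup_x phi(s|W_x||Wbar_x) over the finite input set."""
    return max(log_phi(s, dict(enumerate(W[x])), dict(enumerate(Wbar[x]))) for x in W)
```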
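That is, 

\[ -\frac{1}{n} \log E_{Q_{W,\vec{P}^n}}(1 - f_n) \ge \frac{-s \bigl( -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \bigr) - \phi(s|W\|\overline{W})}{1 - s}. \]

When $\lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) \ge r$, the inequality 

\[ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{W,\vec{P}^n}}(1 - f_n) \ge \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} \]

holds. Taking the supremum, we obtain 

\[ B_e^*(r|W\|\overline{W}) \ge \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s}. \]

From conditions (11) and (26), there exists a small real number $\epsilon > 0$ such that $r > \frac{\phi(-\epsilon|W\|\overline{W})}{\epsilon}$. Thus, 

\[ \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} \ge \frac{\epsilon r - \phi(-\epsilon|W\|\overline{W})}{1 + \epsilon} > 0. \]

Therefore, we obtain (25). 

Remark 2: The technique of the strong converse part except for (28) was developed by Nagaoka [19]. Hence, deriving (28) 
can be regarded as the main contribution in this section of the present paper. 

Proof of Lemma 1: 

It is sufficient for a proof of (11) to show the uniformity of the convergence $\frac{\phi(-\epsilon|W_x\|\overline{W}_x)}{\epsilon} - D(W_x\|\overline{W}_x) \to 0$ concerning 
$x \in \mathcal{X}$. Now, we choose $\epsilon > 0$ satisfying condition (13). Then, there exists $s \in [-\epsilon, 0]$ such that $\frac{\phi(-\epsilon|W_x\|\overline{W}_x)}{\epsilon} - D(W_x\|\overline{W}_x) = 
\frac{1}{2} \epsilon \frac{d^2 \phi(s|W_x\|\overline{W}_x)}{ds^2} \le \frac{C_1}{2} \epsilon$. Therefore, the condition (11) holds. 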

VII. Proof of the Hoeffding bound: (15) 
In this section, we prove the Hoeffding bound (15). Since the inequality 

\[ B_e(r|W\|\overline{W}) \ge \sup_{x \in \mathcal{X}} \sup_{0 \le s \le 1} \frac{-sr - \phi(s|W_x\|\overline{W}_x)}{1 - s} = \sup_{x \in \mathcal{X}} \min_{Q: D(Q\|\overline{W}_x) \le r} D(Q\|W_x) \]

is trivial, we prove the opposite inequality. In the following proof, the geometric characterization in Fig. 1 and the weak and the 
strong converse parts are essential. Equation (6) guarantees that 

\[ \sup_{x \in \mathcal{X}} \min_{Q: D(Q\|\overline{W}_x) \le r} D(Q\|W_x) = \sup_{x \in \mathcal{X}} \min_{s \in [0,1]: D(P_{s,W_x,\overline{W}_x}\|\overline{W}_x) \le r} D(P_{s,W_x,\overline{W}_x}\|W_x). \]

For this purpose, for arbitrary $\epsilon > 0$, we choose a channel $V : V_x = P_{s(x),W_x,\overline{W}_x}$ by 

\[ s(x) := \operatorname{argmin}_{s \in [0,1]: D(P_{s,W_x,\overline{W}_x}\|\overline{W}_x) \le r - \epsilon} D(P_{s,W_x,\overline{W}_x}\|W_x). \]

Assume that a sequence $\{(\vec{P}^n, f_n)\}$ satisfies 

\[ \lim_{n\to\infty} -\frac{1}{n} \log E_{Q_{\overline{W},\vec{P}^n}}(1 - f_n) = r. \]

By substituting $V$ into $W$, the strong converse part of the Stein bound (25) implies that 

\[ \lim_{n\to\infty} E_{Q_{V,\vec{P}^n}}(1 - f_n) = 0. \]
The condition ( fT~3l > can be checked by the following relations: 

m\p.M,w m w*) = (1 _ s{xW{s{x){l _ t) + , rxl | Wx) (30) 

d 2 Ht\p six) ^ w jw x ) = (i _ a(jB)) _ t) + t|wg|F ^ (31) 

Thus, by substituting V and into W and PF, the relation d24l i implies that 

En --logE Q (l-/„) < sup D{V X \\W X ). 
Similar to (f30b and (l3T1 l. we can check the condition (Tl~3T >. 
From the construction of V, we obtain 



lim --logEg (1 -/„) < max min D(Q\\W X ). 

™^°° n n ' F x Q : D(Q\\W x )<r-e 

The uniform continuity guarantees that 

lim --logE Q (1 - /„) < max min D(Q\\W X ). 

n->oo n w ^ x Q;D(Q\\W x )<r 

Now, we show the uniformity of the function r t— > sup 0<s<1 ~ sr ~^\w x \\Wx) concernm g x As mentioned in p. 82 of 



Hayashi[ll], the relation 



holds, where 



Since 



we have 



d -sr-4>{s\W x \\W x ) s r 

T SU P 1 = T 

dr < s <i l-s s r - 1 

-sr - <f>(s\W x \\W x ) 
s r := argmax . 

0<s<l 1 — s 



d -sr - 6(s\W x \\W a 



dr 1 — s 



= 0, 

S — 8 r 



r = {s r - l)0'(s r \W x \\W x ) - <f>{s r \W x \\W x ). 
Since -<j)(s r \W x \\W x ) > 0, (s r - 1) < 0, and <f>"(s\ W x \\ W x ) > 0, 

r > (s r - lW{s r \W x \\W x ) > {s r - 1)4>'(1\W X \\W X ) = (1 - s r )D(W x \\W x ). 

Thus, 



Hence, 



=J- > (1 - S r ). 

D(W X \\W X ) ~ 



s r , < 1 < D(W X \\W X ) < su Vx D(W x \\W x ) 



S r — 1 1 — S r 



Therefore, the function r i— > sup 0<s<1 — — ^M^kll^g) j s uniform continuous with respect to x. 

VIII. Proof of the Han-Kobayashi bound: (17) 

The inequality 

\[ B_e^*(r|W\|\overline{W}) \ge \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} \qquad (32) \]

has been shown in Section VI, and the inequality 

\[ B_e^*(r|W\|\overline{W}) \le \inf_{P \in \mathcal{P}_2(\mathcal{X})} \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s} \]

can be easily checked by considering the input $P$. Therefore, it is sufficient to show the inequality 

\[ \inf_{P \in \mathcal{P}_2(\mathcal{X})} \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s} \le \sup_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s} = \sup_{s \le 0} \inf_{P \in \mathcal{P}_2(\mathcal{X})} \frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s}. \qquad (33) \]

This relation seems to be guaranteed by the mini-max theorem (Chap. VI Prop. 2.3 of [5]). However, the function $\frac{-sr - \phi(s|W\|\overline{W}|P)}{1 - s}$ 
is not necessarily concave concerning $s$ while it is convex concerning $P$. Hence, this relation cannot be guaranteed by the 
mini-max theorem. 
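Now, we prove this inequality when the maximum $\max_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s}$ exists. Since $\phi(s|W_x\|\overline{W}_x)$ is convex concerning 
$s$, $\phi(s|W\|\overline{W})$ is also convex concerning $s$. Then, we can define 

\[ \partial^{\pm}\phi(s|W\|\overline{W}) := \lim_{\epsilon \to \pm 0} \frac{\phi(s + \epsilon|W\|\overline{W}) - \phi(s|W\|\overline{W})}{\epsilon}. \]

Hence, the real number $s_r := \operatorname{argmax}_{s \le 0} \frac{-sr - \phi(s|W\|\overline{W})}{1 - s}$ satisfies that 

\[ (1 - s_r)\partial^{-}\phi(s_r|W\|\overline{W}) + \phi(s_r|W\|\overline{W}) \le -r \le (1 - s_r)\partial^{+}\phi(s_r|W\|\overline{W}) + \phi(s_r|W\|\overline{W}). \]

That is, there exists $\lambda \in [0, 1]$ such that 

\[ -r = (1 - s_r) \bigl( \lambda \partial^{+}\phi(s_r|W\|\overline{W}) + (1 - \lambda) \partial^{-}\phi(s_r|W\|\overline{W}) \bigr) + \phi(s_r|W\|\overline{W}). \qquad (34) \]

For an arbitrary real number $1 > \epsilon > 0$, there exists $1 > \delta > 0$ such that 

\[ \frac{\phi(s + \delta|W\|\overline{W}) - \phi(s|W\|\overline{W})}{\delta} \le \partial^{+}\phi(s|W\|\overline{W}) + \epsilon, \qquad (35) \]
\[ \frac{\phi(s|W\|\overline{W}) - \phi(s - \delta|W\|\overline{W})}{\delta} \ge \partial^{-}\phi(s|W\|\overline{W}) - \epsilon. \qquad (36) \]

Then, we choose $x^{+}, x^{-} \in \mathcal{X}$ such that 

\[ \phi(s_r + \lambda\delta|W\|\overline{W}) - \delta\epsilon \le \phi(s_r + \lambda\delta|W_{x^{+}}\|\overline{W}_{x^{+}}) \le \phi(s_r + \lambda\delta|W\|\overline{W}), \qquad (37) \]
\[ \phi(s_r - (1 - \lambda)\delta|W\|\overline{W}) - \delta\epsilon \le \phi(s_r - (1 - \lambda)\delta|W_{x^{-}}\|\overline{W}_{x^{-}}) \le \phi(s_r - (1 - \lambda)\delta|W\|\overline{W}). \qquad (38) \]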

Thus, (|37| | implies that 

+ A5|W X+ ||W X+ ) - 0(s r - (1 - X)S\W X+ \\W X+ ) 



>- 



___ s 

(s r + A<J|W||W) -Se- 4>(s r - (1 - A)<J|W||W) 



> 



> 



6 _ 

</>(a r + AjjWfKQ - <j>(s r + + ^»(s r + \W\\W) - <j>(s r - (1 - A}£|W]|W] - Se 

_ s~2 

X5d+(j){s r \W\\W) + (1 - X)6(d-(j>(s r + |W]|W} - e) - 6e 
_ S _ 

=Xd + (/)(s r \W\\W) + (1 - A)<r0(s r + |W||W) - e. (39) 

Similarly, d38l implies that 

0(gr + Ajigv ||wy) - <t>{s r - (i - a)^- ||wy) 

_ 6 _ 

<Xd + 4>(s r \W\\W) + (1 - A)d^(s r + |W||W0 + e - ( 4 °) 

Therefore, there exists a real number A' € [0, 1] such that 

¥>(s r + A<5|A') - V3(s r - (1 - A)5|A') 



(Aa+0(s r |VF||T^) + (1 - X)d~(j){s r + \W\\W)) 



6 

<e. (41) 

where 

<p{a\\') := AV(s|W^+ + (1 - A>( S |^- HTF,-). 

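Thus, there exists $\tilde{s}_r\in[s_r-(1-\lambda)\delta,\ s_r+\lambda\delta]$ such that

\[
\bigl|\varphi'(\tilde{s}_r|\lambda')-\bigl(\lambda\partial^{+}\phi(s_r|W\|\overline{W})+(1-\lambda)\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\bigr| \le \epsilon. \tag{42}
\]

The relation (41) also implies that

\begin{align*}
0 &\le \varphi(s_r-(1-\lambda)\delta|\lambda')-\varphi(\tilde{s}_r|\lambda') \le \varphi(s_r-(1-\lambda)\delta|\lambda')-\varphi(s_r+\lambda\delta|\lambda')\\
&\le \bigl[\epsilon-\bigl(\lambda\partial^{+}\phi(s_r|W\|\overline{W})+(1-\lambda)\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\bigr]\delta\\
&\le \bigl(\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta. \tag{43}
\end{align*}

Since

\[
\phi(s_r-(1-\lambda)\delta|W_{x^{+}}\|\overline{W}_{x^{+}}) \ge \phi(s_r+\lambda\delta|W_{x^{+}}\|\overline{W}_{x^{+}}),
\]

relations (36) and (37) guarantee that

\begin{align*}
0 &\le \phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r-(1-\lambda)\delta|W_{x^{+}}\|\overline{W}_{x^{+}})\\
&\le \phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r+\lambda\delta|W\|\overline{W})+\phi(s_r+\lambda\delta|W\|\overline{W})-\phi(s_r+\lambda\delta|W_{x^{+}}\|\overline{W}_{x^{+}})\\
&\le \bigl(\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)(s_r+\lambda\delta-s_r)+\delta\epsilon\\
&\le \bigl(\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta+\delta\epsilon = \bigl(2\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta.
\end{align*}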

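Therefore,

\begin{align*}
0 &\le \phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\varphi(s_r-(1-\lambda)\delta|\lambda')\\
&\le \lambda'\bigl(\phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r-(1-\lambda)\delta|W_{x^{+}}\|\overline{W}_{x^{+}})\bigr)
+(1-\lambda')\bigl(\phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r-(1-\lambda)\delta|W_{x^{-}}\|\overline{W}_{x^{-}})\bigr)\\
&\le \lambda'\bigl(2\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta+(1-\lambda')\delta\epsilon
\le \bigl(2\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta. \tag{44}
\end{align*}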

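Since (36) implies that

\[
\phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r|W\|\overline{W}) \le \bigl(\epsilon-\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta,
\]

relations (43) and (44) guarantee that

\begin{align*}
&\bigl|\varphi(\tilde{s}_r|\lambda')-\phi(s_r|W\|\overline{W})\bigr|\\
\le{}&\bigl|\varphi(\tilde{s}_r|\lambda')-\varphi(s_r-(1-\lambda)\delta|\lambda')\bigr|
+\bigl|\varphi(s_r-(1-\lambda)\delta|\lambda')-\phi(s_r-(1-\lambda)\delta|W\|\overline{W})\bigr|
+\bigl|\phi(s_r-(1-\lambda)\delta|W\|\overline{W})-\phi(s_r|W\|\overline{W})\bigr|\\
\le{}&\bigl(4\epsilon-3\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\delta \le C_2\delta, \tag{45}
\end{align*}

where

\[
C_2 := 4-3\partial^{-}\phi(s_r|W\|\overline{W}) \ge 4\epsilon-3\partial^{-}\phi(s_r|W\|\overline{W}).
\]

Note that the constant $C_2$ does not depend on $\epsilon$ or $\delta$.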

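We choose a real number $\tilde{r} := (1-\tilde{s}_r)\varphi(\tilde{s}_r|\lambda')+\varphi'(\tilde{s}_r|\lambda')$. Then, (45), (42), and the inequality $|\tilde{s}_r-s_r|\le\delta$ imply that

\begin{align*}
&|\tilde{r}-r|\\
\le{}&\bigl|(1-\tilde{s}_r)\varphi(\tilde{s}_r|\lambda')-(1-s_r)\phi(s_r|W\|\overline{W})\bigr|
+\bigl|\varphi'(\tilde{s}_r|\lambda')-\bigl(\lambda\partial^{+}\phi(s_r|W\|\overline{W})+(1-\lambda)\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\bigr|\\
\le{}&\bigl|(1-\tilde{s}_r)\bigl(\varphi(\tilde{s}_r|\lambda')-\phi(s_r|W\|\overline{W})\bigr)\bigr|
+\bigl|\phi(s_r|W\|\overline{W})(\tilde{s}_r-s_r)\bigr|
+\bigl|\varphi'(\tilde{s}_r|\lambda')-\bigl(\lambda\partial^{+}\phi(s_r|W\|\overline{W})+(1-\lambda)\partial^{-}\phi(s_r|W\|\overline{W})\bigr)\bigr|\\
\le{}&(1-\tilde{s}_r)C_2\delta+|\phi(s_r|W\|\overline{W})|\delta+\epsilon \le C_3\delta+\epsilon, \tag{46}
\end{align*}

where

\begin{align*}
C_3 &:= (2-s_r)C_2+|\phi(s_r|W\|\overline{W})|\\
&\ge (1-s_r+(1-\lambda)\delta)C_2+|\phi(s_r|W\|\overline{W})|\\
&\ge (1-\tilde{s}_r)C_2+|\phi(s_r|W\|\overline{W})|.
\end{align*}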

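Note that the constant $C_3$ does not depend on $\epsilon$ or $\delta$. The function $s\mapsto\frac{-s\tilde{r}-\varphi(s|\lambda')}{1-s}$ takes the maximum at $s=\tilde{s}_r$. Using (45) and (46), we can check that this maximum is approximated by the value $\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r}$ as

\begin{align*}
&\left|\frac{-\tilde{s}_r\tilde{r}-\varphi(\tilde{s}_r|\lambda')}{1-\tilde{s}_r}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r}\right|\\
\le{}&\left|\frac{-\tilde{s}_r\tilde{r}-\varphi(\tilde{s}_r|\lambda')}{1-\tilde{s}_r}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-\tilde{s}_r}\right|
+\left|\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-\tilde{s}_r}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r}\right|\\
\le{}&\left|\frac{s_r r-\tilde{s}_r\tilde{r}}{1-\tilde{s}_r}\right|
+\left|\frac{\varphi(\tilde{s}_r|\lambda')-\phi(s_r|W\|\overline{W})}{1-\tilde{s}_r}\right|
+\left|\frac{\bigl(-s_r r-\phi(s_r|W\|\overline{W})\bigr)(\tilde{s}_r-s_r)}{(1-\tilde{s}_r)(1-s_r)}\right|\\
\le{}&\frac{|\tilde{s}_r(\tilde{r}-r)|+|r(\tilde{s}_r-s_r)|}{1-\tilde{s}_r}
+\left|\frac{\varphi(\tilde{s}_r|\lambda')-\phi(s_r|W\|\overline{W})}{1-\tilde{s}_r}\right|
+\left|\frac{\bigl(-s_r r-\phi(s_r|W\|\overline{W})\bigr)\delta}{(1-s_r+1)(1-s_r)}\right|\\
\le{}&\frac{(-s_r+\delta)(C_3\delta+\epsilon)+|r|\delta}{2-s_r}+\frac{C_2\delta}{2-s_r}
+\frac{\bigl|-s_r r-\phi(s_r|W\|\overline{W})\bigr|\delta}{(2-s_r)(1-s_r)}\\
\le{}& C_4\epsilon+C_5\delta, \tag{47}
\end{align*}

where we choose $C_4$ and $C_5$ as follows.

\begin{align*}
C_4 &:= \frac{-s_r+1}{2-s_r} \ge \frac{-s_r+\delta}{2-s_r},\\
C_5 &:= \frac{(-s_r+1)C_3+|r|+C_2}{2-s_r}+\frac{\bigl|-s_r r-\phi(s_r|W\|\overline{W})\bigr|}{(2-s_r)(1-s_r)}
\ge \frac{(-s_r+\delta)C_3+|r|+C_2}{2-s_r}+\frac{\bigl|-s_r r-\phi(s_r|W\|\overline{W})\bigr|}{(2-s_r)(1-s_r)}.
\end{align*}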

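Note that the constants $C_4$ and $C_5$ do not depend on $\delta$ or $\epsilon$. Since

\[
\left|\frac{-s\tilde{r}-\varphi(s|\lambda')}{1-s}-\frac{-sr-\varphi(s|\lambda')}{1-s}\right|
\le \left|\frac{-s}{1-s}\right||\tilde{r}-r| \le |\tilde{r}-r|,
\]

(46) implies that

\[
\left|\max_{s\le 0}\frac{-s\tilde{r}-\varphi(s|\lambda')}{1-s}-\max_{s\le 0}\frac{-sr-\varphi(s|\lambda')}{1-s}\right| \le C_3\delta+\epsilon. \tag{48}
\]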

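Since $\varphi(s|\lambda')\le\phi(s|W\|\overline{W})$, (48) and (47) guarantee that

\[
0 \le \max_{s\le 0}\frac{-sr-\varphi(s|\lambda')}{1-s}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r} \le (C_4+1)\epsilon+(C_3+C_5)\delta. \tag{49}
\]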

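We define the distribution $P_{\lambda'}\in\mathcal{P}_2(\mathcal{X})$ by

\[
P_{\lambda'}(x^{+}) = \lambda',\qquad P_{\lambda'}(x^{-}) = 1-\lambda'.
\]

Since the function $x\mapsto\log x$ is concave, the inequality

\[
\varphi(s|\lambda') \le \phi(s|W\|\overline{W}|P_{\lambda'}) \tag{50}
\]

holds. Hence, (49) and (50) imply that

\begin{align*}
0 &\le \inf_{P\in\mathcal{P}_2(\mathcal{X})}\max_{s\le 0}\frac{-sr-\phi(s|W\|\overline{W}|P)}{1-s}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r}\\
&\le \max_{s\le 0}\frac{-sr-\phi(s|W\|\overline{W}|P_{\lambda'})}{1-s}-\frac{-s_r r-\phi(s_r|W\|\overline{W})}{1-s_r}
\le (C_4+1)\epsilon+(C_3+C_5)\delta.
\end{align*}

We take the limit $\delta\to+0$. After this limit, we take the limit $\epsilon\to+0$. Then, we obtain (33).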
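The Jensen step (50) can be checked numerically on a toy example. The sketch below assumes the definitions $\phi(s|P\|\overline{P})=\log\sum_y P(y)^{1-s}\overline{P}(y)^{s}$ and $\phi(s|W\|\overline{W}|P)=\log\sum_x P(x)\sum_y W_x(y)^{1-s}\overline{W}_x(y)^{s}$, which are not restated in this appendix; the two output distributions attached to $x^{+}$ and $x^{-}$ and all function names are hypothetical.

```python
import math

def phi(s, p, q):
    """Assumed definition: phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s."""
    return math.log(sum(pi ** (1.0 - s) * qi ** s for pi, qi in zip(p, q)))

def varphi(s, lam, plus, minus):
    """varphi(s|lambda'): convex combination of the two single-input functions."""
    return lam * phi(s, *plus) + (1.0 - lam) * phi(s, *minus)

def phi_averaged(s, lam, plus, minus):
    """Assumed phi(s|W||Wbar|P) for the two-point distribution P_lambda'."""
    return math.log(
        lam * math.exp(phi(s, *plus)) + (1.0 - lam) * math.exp(phi(s, *minus))
    )

# Hypothetical pairs (W_x, Wbar_x) for the two inputs x+ and x-.
X_PLUS = ([0.9, 0.1], [0.5, 0.5])
X_MINUS = ([0.6, 0.4], [0.1, 0.9])

S_GRID = [-0.1 * k for k in range(51)]      # s in [-5, 0]
LAM_GRID = [k / 20 for k in range(21)]      # lambda' in [0, 1]

# Gap phi(s|W||Wbar|P_lambda') - varphi(s|lambda'); (50) says it is non-negative.
GAPS = [
    phi_averaged(s, lam, X_PLUS, X_MINUS) - varphi(s, lam, X_PLUS, X_MINUS)
    for s in S_GRID
    for lam in LAM_GRID
]
```

The gap vanishes at $\lambda'\in\{0,1\}$, where $P_{\lambda'}$ is a point mass, and is otherwise non-negative by the concavity of the logarithm.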

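Next, we prove the inequality (33) when the maximum $\max_{s\le 0}\frac{-sr-\phi(s|W\|\overline{W})}{1-s}$ does not exist. The real number $R := \lim_{s\to-\infty}\frac{\phi(s|W\|\overline{W})}{s}$ satisfies $r\ge -R$. Thus,

\[
\sup_{s\le 0}\frac{-sr-\phi(s|W\|\overline{W})}{1-s} = r+R.
\]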

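For any $\epsilon>0$, there exists $s_0<0$ such that any $s<s_0$ satisfies that

\[
R \le \frac{\phi(s_0|W\|\overline{W})-\phi(s|W\|\overline{W})}{s_0-s} \le R+\epsilon.
\]

We choose $x_0$ such that

\[
\phi(s_0-1|W\|\overline{W})-\epsilon \le \phi(s_0-1|W_{x_0}\|\overline{W}_{x_0}) \le \phi(s_0-1|W\|\overline{W}).
\]

Thus,

\[
\phi(s_0|W_{x_0}\|\overline{W}_{x_0})-\phi(s_0-1|W_{x_0}\|\overline{W}_{x_0}) \le \phi(s_0|W\|\overline{W})-\phi(s_0-1|W\|\overline{W})+\epsilon \le R+2\epsilon.
\]

Hence, for any $s<s_0$,

\begin{align*}
\frac{\phi(s_0|W_{x_0}\|\overline{W}_{x_0})-\phi(s|W_{x_0}\|\overline{W}_{x_0})}{s_0-s}
&\le \phi(s_0|W_{x_0}\|\overline{W}_{x_0})-\phi(s_0-1|W_{x_0}\|\overline{W}_{x_0})\\
&\le \phi(s_0|W\|\overline{W})-\phi(s_0-1|W\|\overline{W})+\epsilon \le R+2\epsilon.
\end{align*}

Thus,

\[
-r \le R \le \lim_{s\to-\infty}\frac{\phi(s|W_{x_0}\|\overline{W}_{x_0})}{s} \le R+2\epsilon.
\]

Therefore,

\[
\sup_{s\le 0}\frac{-sr-\phi(s|W_{x_0}\|\overline{W}_{x_0})}{1-s} \le r+R+2\epsilon.
\]

Taking $\epsilon\to 0$, we obtain (33).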
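The two quantities compared in this proof can also be evaluated on a grid for a toy example. The following Python sketch is only a sanity check under stated assumptions: it uses the assumed definitions $\phi(s|P\|\overline{P})=\log\sum_y P(y)^{1-s}\overline{P}(y)^{s}$, $\phi(s|W\|\overline{W})=\max_x\phi(s|W_x\|\overline{W}_x)$ and $\phi(s|W\|\overline{W}|P)=\log\sum_x P(x)e^{\phi(s|W_x\|\overline{W}_x)}$, truncates $s\le 0$ to a finite grid, and works with hypothetical two-input channels and a hypothetical value of $r$.

```python
import math

def phi(s, p, q):
    """Assumed definition: phi(s|P||Pbar) = log sum_y P(y)^(1-s) * Pbar(y)^s."""
    return math.log(sum(pi ** (1.0 - s) * qi ** s for pi, qi in zip(p, q)))

def objective(s, r, phi_value):
    """The ratio (-s r - phi) / (1 - s) that is maximized over s <= 0."""
    return (-s * r - phi_value) / (1.0 - s)

# Hypothetical toy channels (two inputs, binary output) and rate r.
W = [[0.9, 0.1], [0.6, 0.4]]
W_BAR = [[0.5, 0.5], [0.1, 0.9]]
R_RATE = 0.3

S_GRID = [-0.02 * k for k in range(1001)]   # truncated range s in [-20, 0]
LAM_GRID = [k / 200 for k in range(201)]    # two-point distributions on {x1, x2}

# phi(s|W_x||Wbar_x) for each grid point s and each input x.
PHIS = [[phi(s, W[x], W_BAR[x]) for x in range(2)] for s in S_GRID]

# Right-hand side: max over s of the ratio with phi(s|W||Wbar) = max_x phi_x(s).
RHS = max(objective(s, R_RATE, max(px)) for s, px in zip(S_GRID, PHIS))

def max_over_s(lam):
    """max over s of the ratio with phi(s|W||Wbar|P), P = (lam, 1 - lam)."""
    return max(
        objective(
            s, R_RATE,
            math.log(lam * math.exp(px[0]) + (1.0 - lam) * math.exp(px[1])),
        )
        for s, px in zip(S_GRID, PHIS)
    )

# Left-hand side: min over the two-point distributions on the grid.
LHS = min(max_over_s(lam) for lam in LAM_GRID)
```

On any grid, `LHS >= RHS` holds because $\phi(s|W\|\overline{W}|P)\le\phi(s|W\|\overline{W})$ for every $P$; the content of (33) is that the remaining gap closes, which this finite-grid sketch can suggest but not prove.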



IX. Concluding remarks and future study 



We have obtained general asymptotic formulas for the discrimination of two classical channels with adaptive choice of input signals under several asymptotic formulations. We have proved that no adaptive method improves the asymptotic performance. That is, the non-adaptive method attains the optimum performance in these asymptotic formulations. Applying the obtained result to the discrimination of two quantum states by one-way LOCC, we have shown that one-way communication does not improve the asymptotic performance in these senses.

On the other hand, as shown in Section 3.5 of Hayashi [11], we cannot improve the asymptotic performance of the Stein bound even if we extend the class of our measurements to the separable POVMs in the n-partite system. Hence, two-way LOCC does not improve the Stein bound. However, the other asymptotic performances under two-way LOCC and separable POVMs remain unsolved. Therefore, it is an interesting problem to determine whether two-way LOCC improves the asymptotic performance for bounds other than the Stein bound.

Furthermore, the discrimination of two quantum channels (TP-CP maps) is an interesting related topic. An open problem
remains as to whether choosing input quantum states adaptively improves the discrimination performance in an asymptotic
framework. The solution to this problem will be sought in a future study.